How Caching Affects Hashing

Authors

  • Gregory L. Heileman
  • Wenbin Luo
Abstract

A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact lead to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed, the name “hashing” itself is meant to convey this notion [12]. However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance tradeoffs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly “chaotic” forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, contrary to those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses. That is, linear probing is usually no better than, and in some cases performs far worse than, double hash functions that spread data more randomly through the table. Thus, for general-purpose use, a practitioner is well-advised to choose double hashing over linear probing. Explanations are provided as to why these results differ from those previ-

∗We wish to thank Bernard Moret for suggesting this topic to us.
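The probe-length side of the tradeoff described in the abstract can be illustrated with a small sketch. It is not the authors' experimental code: the function names, the table size, the seed, and the choice of `key % m` as the primary hash are all assumptions made for illustration. It only counts probes during insertion and says nothing about cache misses, which are the other half of the comparison the paper measures.

```python
import random


def linear_probe(key, i, m):
    """i-th slot examined for key under linear probing."""
    return (key + i) % m


def double_hash_probe(key, i, m):
    """i-th slot examined for key under double hashing (assumes m is prime)."""
    step = 1 + key % (m - 1)  # in 1..m-1, so coprime with a prime m
    return (key + i * step) % m


def insert_all(keys, m, probe):
    """Insert keys into an empty table of m slots; return (table, total probes)."""
    table = [None] * m
    probes = 0
    for key in keys:
        for i in range(m):
            slot = probe(key, i, m)
            probes += 1
            if table[slot] is None:
                table[slot] = key
                break
        else:
            raise RuntimeError("table is full")
    return table, probes


if __name__ == "__main__":
    # Hypothetical workload: 900 random keys in a 1009-slot table (1009 is prime).
    m, n = 1009, 900
    keys = random.Random(42).sample(range(10**9), n)
    for name, probe in [("linear", linear_probe), ("double", double_hash_probe)]:
        _, probes = insert_all(keys, m, probe)
        print(f"{name:>6}: {probes / n:.2f} probes per insertion")
```

Linear probing examines consecutive slots, which is the source of the spatial locality the abstract mentions; double hashing jumps by a key-dependent step, which scatters the probes across the table. A probe count like the one above captures only the collision cost, so timing on real hardware is needed to see how the two effects balance.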

Related resources

Web Caching with Consistent Hashing

A key performance measure for the World Wide Web is the speed with which content is served to users. As traffic on the Web increases, users are faced with increasing delays and failures in data delivery. Web caching is one of the key strategies that has been explored to improve performance. An important issue in many caching systems is how to decide what is cached where at any given time. Solut...

Asymptotic Miss Ratio of LRU Caching with Consistent Hashing

To efficiently scale data caching infrastructure to support emerging big data applications, many caching systems rely on consistent hashing to group a large number of servers to form a cooperative cluster. These servers are organized together according to a random hash function. They jointly provide a unified but distributed hash table to serve swift and voluminous data item requests. Different...

DISH - Dynamic Information-Based Scalable Hashing on a Cluster of Web Cache Servers

Caching web pages is an important part of web infrastructure. The effects of caching services are even more pronounced for wireless infrastructures due to their limited bandwidth. Medium to large-scale infrastructures deploy a cluster of servers to solve the scalability problem and hot spot problem inherent in caching. In this report, we present Dynamic Information-based Scalable Hashing (DISH)...

MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing

This paper presents a set of architecturally and workload-inspired algorithmic and engineering improvements to the popular Memcached system that substantially improve both its memory efficiency and throughput. These techniques—optimistic cuckoo hashing, a compact LRU-approximating eviction algorithm based upon CLOCK, and comprehensive implementation of optimistic locking—enable the resulting sys...

A Study of the Performance and Parameter Sensitivity of Adaptive Distributed Caching

A self-organized approach to manage a distributed proxy system called Adaptive Distributed Caching (ADC) has been proposed previously. We model each proxy as an autonomous agent that is equipped to decide how to deal with client requests using local information. Experimental results show that our ADC algorithm is able to compete with typical hashing based approaches. This paper gives a full des...

Publication date: 2005